Introduction to Time Series
\(\hspace{0.3cm}\) More articles: \(\hspace{0.1cm}\) Estadistica4all
\(\hspace{0.3cm}\) Author: \(\hspace{0.1cm}\) Fabio Scielzo Ortiz
\(\hspace{0.3cm}\) If you use this article, please cite it.
\(\hspace{0.5cm}\) Scielzo Ortiz, F. (2023). Introduction to Time Series. http://estadistica4all.com/Articulos/Intervalos-de-confianza.html
It’s recommended to open the article on a computer or tablet.
1 Introduction to stochastic processes
1.1 Stochastic processes
Let \(\hspace{0.1cm}\mathcal{X}_t\hspace{0.1cm}\) be a random variable (r.v.), for each \(\hspace{0.1cm}t\in T\)
\(\hspace{0.25cm}\) A stochastic process is a set of random variables \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm}\right\rbrace\hspace{0.1cm}\) such that \(\hspace{0.1cm}\hspace{0.1cm}\mathcal{X}_t \in S \subset \mathbb{R}\)
\(\hspace{0.25cm}\) where:
\(T\hspace{0.1cm}\) is called the parameter space and is the set of indices of the random variables that define the stochastic process. \(\\[0.35cm]\)
\(S\hspace{0.1cm}\) is called the state space and is the set of possible values of the random variables that define the stochastic process. \(\\[0.35cm]\)
We will say that \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm} : \hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.1cm}\) is a stochastic process with parameter space \(\hspace{0.1cm}T\hspace{0.1cm}\) and state space \(\hspace{0.1cm}S\) \(\\[0.5cm]\)
Observation:
\(T\hspace{0.1cm}\) is generally interpreted as moments or periods of time, because one of the most important applications of stochastic processes is time series modeling.
Therefore:
\(X_t\hspace{0.1cm}\) is a random variable usually used to model the state of a system at time \(\hspace{0.06cm}t\hspace{0.06cm}\), or to model a variable of interest at the moment or period \(\hspace{0.06cm}t\).
1.2 Discrete stochastic process
\(\hspace{0.25cm}\) \(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.15cm}\) is a discrete stochastic process if \(\hspace{0.15cm}T\subset \lbrace 0,1,2,... \rbrace\)
1.3 Continuous stochastic process
\(\hspace{0.25cm}\) \(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.15cm}\) is a continuous stochastic process if \(\hspace{0.15cm}T\subset [0, \infty)\)
1.4 Types of stochastic processes
1.4.1 Independent process
\(\hspace{0.25cm}\)\(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.1cm}\) is an independent stochastic process if the random variables that define the process are independent.
1.4.2 Markov process
\(\hspace{0.25cm}\) A discrete stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \in \lbrace 0,1,2,... \hspace{0.1cm} \rbrace \hspace{0.1cm}\rbrace\hspace{0.2cm}\) is a Markov process if: \(\\[0.15cm]\)
\[P(\mathcal{X}_{n+1} = x_{n+1}\hspace{0.15cm} |\hspace{0.15cm} \mathcal{X}_0 = x_0 ,..., \mathcal{X}_n =x_n) \hspace{0.1cm}=\hspace{0.1cm} P(\mathcal{X}_{n+1} = x_{n+1}\hspace{0.15cm} |\hspace{0.15cm} \mathcal{X}_n = x_n)\] \(\\[0.15cm]\)
\(\hspace{0.25cm}\) where: \(\hspace{0.2cm} x_{t} \in S \hspace{0.2cm},\hspace{0.2cm} \forall\hspace{0.1cm} t \in \lbrace 0,1,...,n+1\rbrace\) \(\\[0.35cm]\)
This property is known as the Markov or memoryless property, because it implies that the future state of the system, \(\mathcal{X}_{n+1}\), depends only on the present state \(x_n\) and not on the past states \(x_0,...,x_{n-1}\).
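For instance, a symmetric random walk is a Markov process. The following minimal sketch (illustrative code, not from the original article) simulates one, generating each new state only from the current one:
import numpy as np
# Minimal illustrative sketch: a symmetric random walk on the integers as a simple Markov process.
rng = np.random.default_rng(0)
x = 0                                # present state x_n
states = [x]
for _ in range(10):
    x = x + rng.choice([-1, 1])      # the next state is built only from the current one
    states.append(x)
print(states)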
1.4.3 Process of independent increments
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.1cm}\) is a process with independent increments if:
\(\hspace{0.25cm}\) For every set of times \(\hspace{0.1cm}t_1,t_2,t_3\geq 0\hspace{0.13cm}\) such that \(\hspace{0.1cm}t_1 < t_2 < t_3\)
\(\hspace{0.25cm}\) \(\mathcal{X}_{t_2} - \mathcal{X}_{t_1} \hspace{0.1cm} , \hspace{0.1cm} \mathcal{X}_{t_3} - \mathcal{X}_{t_2}\hspace{0.1cm}\) are independent.
This means that the displacements of the process in the time intervals \(\hspace{0.1cm}[t_1 , t_2) , [t_2 , t_3)\hspace{0.1cm}\) are independent of each other, for all \(\hspace{0.1cm}0 \leq t_1 < t_2 < t_3\).
1.4.4 Strictly stationary process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.2cm}\) is a strictly stationary process if:
\(\hspace{0.25cm}\) For every set of times \(\hspace{0.1cm}t_1 , t_2,...,t_n \geq 0\hspace{0.1cm}\) and every \(h>0\),
\((\mathcal{X}_{t_1}, \mathcal{X}_{t_2},\dots ,\mathcal{X}_{t_n} )\hspace{0.1cm}\) is identically distributed as \(\hspace{0.1cm}(\mathcal{X}_{t_1+h}, \mathcal{X}_{t_2+h},\dots ,\mathcal{X}_{t_n+h} )\)
In particular, for all \(t \geq 0\) and \(h>0\), the probability distribution of \(\mathcal{X}_{t}\) is the same as that of \(\mathcal{X}_{t+h}\).
1.4.5 Process with stationary increments
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.1cm}\) is a process with stationary increments if:
\(\hspace{0.25cm}\) For every pair of times \(\hspace{0.1cm}t_1,t_2 > 0\hspace{0.1cm}\) such that \(\hspace{0.1cm}t_1 < t_2\)
\(\hspace{0.25cm}\) \(\mathcal{X}_{t_2} - \mathcal{X}_{t_1}\hspace{0.1cm}\) and \(\hspace{0.1cm}\mathcal{X}_{t_2 + h} - \mathcal{X}_{t_1 + h}\hspace{0.1cm}\) have the same probability distribution, for any \(\hspace{0.1cm}h>0\)
1.4.6 Martingale process
\(\hspace{0.25cm}\) A discrete stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \in \lbrace 0,1,2,... \hspace{0.1cm} \rbrace \hspace{0.1cm}\rbrace\hspace{0.1cm}\) is a martingale process if: \(\\[0.15cm]\)
\[E\left[\hspace{0.1cm}\mathcal{X}_{n+1} \hspace{0.1cm}|\hspace{0.1cm} \mathcal{X}_0 = x_0 ,..., \mathcal{X}_n = x_n\hspace{0.1cm} \right] \hspace{0.1cm} = \hspace{0.1cm} x_n\] \(\\[0.15cm]\)
\(\hspace{0.25cm}\) where: \(\hspace{0.2cm} x_{t} \in S \hspace{0.2cm},\hspace{0.2cm} \forall\hspace{0.1cm} t \in \lbrace 0,1,...,n+1\rbrace\) \(\\[0.35cm]\)
This property is known as the martingale property, and it implies that the expected value of the system at the future time \(\hspace{0.1cm}n+1\hspace{0.1cm}\), given its history, is the value of the system at the present, \(\hspace{0.1cm}x_n\). In mean, the system does not move away from the state observed at the last moment.
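A symmetric random walk is also a martingale. The minimal sketch below (illustrative code) checks the property by simulation: conditional on the present state \(x_n\), the average of many simulated next states is approximately \(x_n\):
import numpy as np
# Minimal illustrative sketch: checking the martingale property of a symmetric random walk.
rng = np.random.default_rng(1)
x_n = 5.0                                   # present state
next_states = x_n + rng.choice([-1, 1], size=100_000)
print(next_states.mean())                   # close to x_n, as the martingale property requires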
1.4.7 Lévy process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.15cm}\) is a Lévy process if it is a process with independent and stationary increments.
The Poisson process and Brownian motion are examples of Lévy processes.
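As an illustration, the following minimal sketch (illustrative code, assuming only numpy and matplotlib) simulates both on a time grid by accumulating independent, stationary increments: Normal increments for Brownian motion and Poisson increments for the Poisson process:
import numpy as np
import matplotlib.pyplot as plt
# Minimal illustrative sketch: Lévy processes built by summing independent, stationary increments.
rng = np.random.default_rng(2)
dt = 0.01
t = np.arange(0, 10, dt)
brownian = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=len(t)))   # Normal(0, dt) increments
poisson = np.cumsum(rng.poisson(2 * dt, size=len(t)))             # Poisson(2*dt) increments, rate 2
plt.plot(t, brownian, label='Brownian motion')
plt.step(t, poisson, label='Poisson process (rate 2)')
plt.legend()
plt.show()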
1.4.8 Gaussian Process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.15cm}\) is a Gaussian process if:
\(\hspace{0.25cm}\) For every set of times \(\hspace{0.1cm}t_1,...,t_n \geq 0\)
\[(\mathcal{X}_{t_1}, \mathcal{X}_{t_2},...,\mathcal{X}_{t_n}) \sim NM(\mu , \Sigma)\]
\(\hspace{0.25cm}\) where:
\(\hspace{0.25cm}\) \(NM(\mu , \Sigma)\hspace{0.1cm}\) denotes the multivariate Normal probability distribution with mean vector \(\hspace{0.1cm}\mu\hspace{0.1cm}\) and covariance matrix \(\hspace{0.1cm}\Sigma\) \(\\[0.4cm]\)
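As an illustration, the minimal sketch below draws one realization of \((\mathcal{X}_{t_1},...,\mathcal{X}_{t_n})\) from a multivariate Normal; the covariance \(\Sigma_{ij} = \min(t_i, t_j)\) (the covariance of Brownian motion) is only an illustrative, assumed choice:
import numpy as np
# Minimal illustrative sketch: one draw of (X_{t_1}, ..., X_{t_n}) from NM(mu, Sigma),
# using Sigma_ij = min(t_i, t_j) (Brownian-motion covariance) as an assumed example.
rng = np.random.default_rng(3)
times = np.array([0.5, 1.0, 1.5, 2.0])
mu = np.zeros(len(times))
Sigma = np.minimum.outer(times, times)          # Cov(X_s, X_t) = min(s, t)
sample = rng.multivariate_normal(mu, Sigma)
print(sample)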
The dynamic phenomena that we observe in a time series can be grouped into two classes:
- The first are those that take stable values in time around a constant level, without showing a long term increasing or decreasing trend. These processes are called stationary.
Examples of these are the average yearly temperatures in a region or the proportion of births corresponding to males.
- A second class of processes are the non-stationary processes, which are those that can show trend, seasonality and other evolutionary effects over time.
Examples of those are the yearly income of a country, company sales or energy demand. These are series that evolve over time with more or less stable trends.
In practice, the classification of a series as stationary or not depends on the period of observation, since the series can be stable in a short period and non-stationary in a longer one.
2 Time series
\(\hspace{0.25cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.1cm}\)
\(\hspace{0.25cm}\) Given a sample of one observation \(\hspace{0.06cm}y_t\hspace{0.06cm}\) of each random variable \(\hspace{0.06cm}\mathcal{Y}_t\hspace{0.06cm}\) of the process, for \(\hspace{0.06cm}t \in T=\lbrace 1,2,...,n \rbrace\). \(\\[0.5cm]\)
- \(Y_t = \left( y_1, y_2, ...,y_n \right)^t \hspace{0.12cm}\) is a time series.
\(\hspace{0.25cm}\) where:
\(\hspace{0.35cm}\) \(y_t\hspace{0.06cm}\) can be interpreted as the observed value of the variable \(\hspace{0.06cm}\mathcal{Y}\hspace{0.06cm}\) at the time or period \(\hspace{0.06cm}t\). \(\\[0.15cm]\)
Observations:
\(y_t \in \mathbb{R}\hspace{0.06cm}\) is a realization of the random variable \(\mathcal{Y}_t\) \(\\[0.35cm]\)
A time series is a realization of a stochastic process. The time series is considered a result or trajectory of the stochastic process. \(\\[0.35cm]\)
A time series can be defined as a vector of data points ordered in time, where the data are equally spaced, namely, between consecutive data points there is the same time interval, such as a day, a week, a month, a quarter …
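As an illustration, a time series can be stored in Python as a pandas Series indexed by equally spaced dates; a minimal sketch with simulated, purely illustrative values:
import pandas as pd
import numpy as np
# Minimal illustrative sketch (simulated values): one value per month, equally spaced in time.
rng = np.random.default_rng(4)
index = pd.date_range(start='2021-01-01', periods=12, freq='MS')
y = pd.Series(rng.normal(100, 10, size=12), index=index, name='y')
print(y)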
The process is characterized by the joint probability distribution of the random variables \(\hspace{0.1cm}\Bigl\{ \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_k \hspace{0.06cm} \Bigl\} \hspace{0.1cm}\), namely, it is characterized by the joint density or probability mass function \(\hspace{0.06cm}f_{\mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_k}\).
This distribution is called the finite-dimensional distribution of the process. We say that we know the probabilistic structure of the stochastic process when we know that joint distribution, which determines the distribution of any subset of the variables and, in particular, the marginal distribution of each variable.
3 Mean function
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
The mean function \(\mu_t\) of the process is defined as:
\[\mu_t = E\left[\mathcal{X}_t\right]\]
for \(t \in \lbrace 1,2,...,k \rbrace \\\)
Observations:
An important particular case, due to its simplicity, arises when all the variables have the same mean and thus the mean function is a constant. The realizations of the process show no trend and we say that the process is stable in the mean.
If, on the contrary, the means change over time, the observations at different moments will reveal that change.
On many occasions we only have one realization of the stochastic process and we have to deduce from that whether the mean function of the process is, or is not, constant over time.
4 Variance function
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
The variance function \(\sigma^2_t\) of the process is defined as:
\[\sigma^2_t = Var\left[\mathcal{X}_t\right]\]
for \(t \in \lbrace 1,2,...,k \rbrace \\\)
We say that the process is stable in the variance if the variability is constant over time.
A process can be stable in the mean but not in the variance and vice versa.
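When several realizations of the process are available, the mean and variance functions can be estimated by averaging, and taking variances, across realizations at each \(t\). A minimal sketch with simulated (illustrative) data:
import numpy as np
# Minimal illustrative sketch (simulated data): 500 realizations of a process
# X_t = 2t + noise observed at t = 1,...,10; each row is one realization.
rng = np.random.default_rng(5)
t = np.arange(1, 11)
realizations = 2 * t + rng.normal(0, 1, size=(500, len(t)))
mu_t = realizations.mean(axis=0)        # estimated mean function (not constant: trend 2t)
sigma2_t = realizations.var(axis=0)     # estimated variance function (roughly constant)
print(mu_t.round(2))
print(sigma2_t.round(2))
This process is stable in the variance but not in the mean, since its mean function grows with \(t\).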
5 Autocovariance function
The structure of linear dependence between random variables is represented by the covariance and correlation functions.
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
The autocovariance function \(\gamma_{t , t+h}\) of the process is defined as:
\[\gamma_{t , t+h} = Cov(\mathcal{X}_t , \mathcal{X}_{t+h}) = E\left[ (\mathcal{X}_t - \mu_t)\cdot (\mathcal{X}_{t+h} - \mu_{t+h}) \right]\]
for \(\hspace{0.1cm}t \in \lbrace 1,2,...,k \rbrace\hspace{0.1cm}\) and \(\hspace{0.1cm} h\in \lbrace 1,2,... \rbrace \\\)
In particular, we have
\[\gamma_{t , t} = \sigma_t^2\]
The autocovariances have dimensions (those of the square of the series), thus it is not advisable to use them for comparing series measured in different units.
6 Autocorrelation function
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
The autocorrelation function \(\rho_{t , t+h}\) of the process is defined as:
\[\rho_{t , t+h} = \dfrac{\gamma_{t , t+h}}{\sqrt{\sigma_t^2 \cdot \sigma_{t+h}^2}}\]
for \(\hspace{0.1cm}t \in \lbrace 1,2,...,k \rbrace\hspace{0.1cm}\) and \(\hspace{0.1cm} h\in \lbrace 1,2,... \rbrace \\\)
In particular, we have
\[\rho_{t , t} = 1\]
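In practice, with an observed series, these quantities are estimated by their sample versions. A minimal sketch (with a simulated, illustrative AR(1) series) computing sample autocovariances and autocorrelations:
import numpy as np
# Minimal illustrative sketch (simulated data): sample autocovariances and
# autocorrelations of an AR(1) series x_t = 0.7 x_{t-1} + e_t.
rng = np.random.default_rng(6)
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
x_c = x - x.mean()
def sample_gamma(h):
    # sample autocovariance at lag h
    return np.sum(x_c[:n - h] * x_c[h:]) / n
gammas = np.array([sample_gamma(h) for h in range(6)])
rhos = gammas / gammas[0]               # rho(h) = gamma(h) / gamma(0)
print(rhos.round(3))                    # approximately 1, 0.7, 0.49, ... for this AR(1)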
It is interesting to notice the differences between conditional distributions and the marginal distributions.
The marginal distribution of \(\mathcal{X}_t\) represents what we know about a variable, without knowing anything about its trajectory until time \(t\).
The conditional distribution of \(\mathcal{X}_t\) given \(\mathcal{X}_{t-1}\),…,\(\mathcal{X}_{t-r}\) represents what we know about a variable when we know the \(r\) previous values of the process.
In time series, conditional distributions are of greater interest than marginal ones, because they define the predictions that we can make about the future knowing the past.
7 Stationary processes
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
A stochastic process is strictly stationary if:
for every set of times \(\hspace{0.1cm}t_1 , t_2,...,t_n \in \lbrace 1,2,...,k \rbrace\hspace{0.1cm}\) and every \(\hspace{0.1cm} h \in \lbrace 1,2,... \rbrace\),
\((\mathcal{X}_{t_1}, \mathcal{X}_{t_2},\dots ,\mathcal{X}_{t_n} )\hspace{0.1cm}\) is identically distributed as \(\hspace{0.1cm}(\mathcal{X}_{t_1+h}, \mathcal{X}_{t_2+h},\dots ,\mathcal{X}_{t_n+h} )\)
In particular, the probability distribution of \(\mathcal{X}_{t}\) is the same as that of \(\mathcal{X}_{t+h}\), for all \(\hspace{0.1cm}t \in \lbrace 1,2,...,k \rbrace\hspace{0.1cm}\) and \(\hspace{0.1cm} h \in \lbrace 1,2,... \rbrace\).
Strict stationarity is a very strong condition, since to prove it we must have the joint distributions for any set of variables in the process. A weaker property, but one which is easier to prove, is weak stationarity.
A stochastic process is weakly stationary if:
\(\mu_t = \mu \hspace{0.2cm} \text{(constant)} , \hspace{0.2cm} \forall t \in \lbrace 1,2,...,k \rbrace\)
\(\sigma_t^2 = \sigma^2 \hspace{0.2cm} \text{(constant)} , \hspace{0.2cm} \forall t \in \lbrace 1,2,...,k \rbrace\)
\(\gamma_{t , t + h} = Cov(\mathcal{X}_t,\mathcal{X}_{t+h}) = E[(\mathcal{X}_t - \mu)\cdot (\mathcal{X}_{t+h} - \mu)] = \gamma(h) , \hspace{0.2cm} \forall h \in \lbrace 0 , \pm 1 , \pm 2 ,... \rbrace\)
The first two conditions indicate that the mean and variance are constant.
The third indicates that the covariance between two variables depends only on their separation.
In a stationary process the autocovariances and autocorrelations depend only on the lag between the variables and, in particular, the relationship between \(\mathcal{X}_t\) and \(\mathcal{X}_{t+h}\) is always equal to the relationship between \(\mathcal{X}_t\) and \(\mathcal{X}_{t-h}\).
As a result, in stationary processes:
\(\gamma_{t , t + h} = \gamma_{t + r , t + h + r} = \gamma(h) , \forall r \in \lbrace 0 , \pm 1 , \pm 2 ,... \rbrace\)
\[\rho_{t, t+h} = \dfrac{\gamma_{t , t + h}}{\sqrt{\sigma_t^2 \cdot \sigma_{t+h}^2}} = \dfrac{\gamma(h)}{\sqrt{\sigma^2 \cdot \sigma^2}} = \dfrac{\gamma(h)}{\sigma^2} = \dfrac{\gamma(h)}{\gamma(0)} = \rho(h)\]
Where:
\(\gamma(0) = \sigma^2\)
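In practice, weak stationarity can be assessed informally (a constant level and spread over time, autocorrelations depending only on the lag) or with formal tests. A minimal sketch (assuming statsmodels is installed, with simulated, illustrative series) contrasting a stationary AR(1) with a random walk using the augmented Dickey-Fuller test:
import numpy as np
from statsmodels.tsa.stattools import adfuller
# Minimal illustrative sketch (simulated data; assumes statsmodels is available).
rng = np.random.default_rng(7)
e = rng.normal(size=1000)
ar1 = np.zeros(1000)                    # weakly stationary AR(1): x_t = 0.7 x_{t-1} + e_t
for t in range(1, 1000):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]
random_walk = np.cumsum(e)              # non-stationary: its variance grows with t
print(adfuller(ar1)[1])                 # small p-value: the unit root is rejected
print(adfuller(random_walk)[1])         # large p-value: consistent with non-stationarity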
8 Visualization of time series in Python
Throughout this article we will use a time series on sales of a company.
First of all, we load some of the libraries that we are going to use:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
sns.set(rc={'figure.figsize':(20,9)})
We load the dataset:
Time_Series_1 = pd.read_csv('Time_Series_1.csv')
Time_Series_1
| | CODARTIC | CODIREGG | IMPLINEA | IMPVENTA | CODTAVEN | QCANTDEC | NUMTARJE |
|---|---|---|---|---|---|---|---|
| 0 | 10254201003449 | 1 | 59.99 | 59.99 | 2022-06-21001069602336387 | 1.0 | 6.008330e+18 |
| 1 | 10234141001560 | 1 | 12.72 | 12.72 | 2021-08-03001002100821700 | 1.0 | NaN |
| 2 | 10865290000019 | 4 | 11.20 | 11.20 | 2022-08-21001009006714129 | 1.0 | 6.008330e+18 |
| 3 | 10004136025419 | 1 | 48.97 | 48.97 | 2022-07-09001003602372694 | 1.0 | NaN |
| 4 | 10073131011804 | 1 | 22.95 | 22.95 | 2022-05-14001091203649080 | 1.0 | 6.008330e+18 |
| … | … | … | … | … | … | … | … |
| 704350 | 10271412016641 | 1 | 18.90 | 18.90 | 2022-02-10001003601776985 | 1.0 | NaN |
| 704351 | 10411524000184 | 2 | -29.99 | -29.99 | 2022-06-02001009803937030 | -1.0 | NaN |
| 704352 | 10092532091505 | 1 | 75.00 | 75.00 | 2022-05-19001003602477163 | 1.0 | 6.008330e+18 |
| 704353 | 10805731000015 | 2 | -12.59 | -10.70 | 2022-02-17001009008963198 | -1.0 | 6.008330e+18 |
| 704354 | 10084472093096 | 1 | 6.95 | 6.95 | 2021-06-06001048902492799 | 1.0 | NaN |
704355 rows × 7 columns
The default periodicity of this time series is daily.
We can group a time series by different periods, such as hours, days, weeks, months, quarters or years.
Specifically, we will group this time series by day, week, month and quarter.
We can extract the date column as follows:
Time_Series_1['Fecha'] = Time_Series_1['CODTAVEN'].str[0:10]
We have to convert the date column to datetime format:
Time_Series_1['Fecha'] = pd.to_datetime(Time_Series_1['Fecha'])
We can create the columns Day, Week, Month, Quarter and Year as follows:
Time_Series_1['Dia'] = Time_Series_1['Fecha'].dt.day
Time_Series_1['Semana'] = Time_Series_1['Fecha'].dt.isocalendar().week  # .dt.week is deprecated in recent pandas
Time_Series_1['Mes'] = Time_Series_1['Fecha'].dt.month
Time_Series_1['Trimestre'] = Time_Series_1['Fecha'].dt.quarter
Time_Series_1['Año'] = Time_Series_1['Fecha'].dt.year
We select the columns with which we are going to work. IMPVENTA will be the response variable, namely, the variable we want to predict.
Time_Series_1 = Time_Series_1.loc[: , ['Fecha', 'Dia', 'Semana', 'Mes', 'Trimestre', 'Año', 'IMPVENTA']]
Time_Series_1
| | Fecha | Dia | Semana | Mes | Trimestre | Año | IMPVENTA |
|---|---|---|---|---|---|---|---|
| 0 | 2022-06-21 | 21 | 25 | 6 | 2 | 2022 | 59.99 |
| 1 | 2021-08-03 | 3 | 31 | 8 | 3 | 2021 | 12.72 |
| 2 | 2022-08-21 | 21 | 33 | 8 | 3 | 2022 | 11.20 |
| 3 | 2022-07-09 | 9 | 27 | 7 | 3 | 2022 | 48.97 |
| 4 | 2022-05-14 | 14 | 19 | 5 | 2 | 2022 | 22.95 |
| … | … | … | … | … | … | … | … |
| 704350 | 2022-02-10 | 10 | 6 | 2 | 1 | 2022 | 18.90 |
| 704351 | 2022-06-02 | 2 | 22 | 6 | 2 | 2022 | -29.99 |
| 704352 | 2022-05-19 | 19 | 20 | 5 | 2 | 2022 | 75.00 |
| 704353 | 2022-02-17 | 17 | 7 | 2 | 1 | 2022 | -10.70 |
| 704354 | 2021-06-06 | 6 | 22 | 6 | 2 | 2021 | 6.95 |
704355 rows × 7 columns
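As an alternative to the groupby approach used in the following subsections, pandas can also aggregate by calendar period with resample once Fecha is set as the index. A minimal sketch (it reuses the Time_Series_1 data frame and the imports above; the 'M' alias may appear as 'ME' in very recent pandas versions):
# Minimal illustrative sketch: monthly aggregation via resample instead of groupby.
monthly_sales = (
    Time_Series_1
    .set_index('Fecha')['IMPVENTA']
    .resample('M')                      # 'M' = calendar month; 'W' or 'Q' would give weeks or quarters
    .sum()
)
monthly_sales.plot(color='red', title='Monthly Time Series (resample)')
plt.show()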
8.1 Visualization of Monthly Time Series
Monthly_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes'])['IMPVENTA'].sum().reset_index(drop=False)
Monthly_Time_Series_1
| | Año | Mes | IMPVENTA |
|---|---|---|---|
| 0 | 2021 | 6 | 992891.50 |
| 1 | 2021 | 7 | 982142.75 |
| 2 | 2021 | 8 | 885289.16 |
| 3 | 2021 | 9 | 878563.14 |
| 4 | 2021 | 10 | 923552.45 |
| 5 | 2021 | 11 | 1527486.61 |
| 6 | 2021 | 12 | 1438722.72 |
| 7 | 2022 | 1 | 1910816.46 |
| 8 | 2022 | 2 | 1317501.34 |
| 9 | 2022 | 3 | 1524652.47 |
| 10 | 2022 | 4 | 2060509.71 |
| 11 | 2022 | 5 | 2316733.47 |
| 12 | 2022 | 6 | 2872219.23 |
| 13 | 2022 | 7 | 2731251.02 |
| 14 | 2022 | 8 | 1844222.02 |
| 15 | 2022 | 9 | 1068975.95 |
| 16 | 2022 | 10 | 895735.29 |
| 17 | 2022 | 11 | 1544511.61 |
| 18 | 2022 | 12 | 1272814.69 |
This step is necessary to create the Month-Year column, which will be used to make the plot.
Monthly_Time_Series_1['Año'] = Monthly_Time_Series_1['Año'].astype('string')
Monthly_Time_Series_1['Mes'] = Monthly_Time_Series_1['Mes'].astype('string')
Monthly_Time_Series_1['Mes-Año'] = Monthly_Time_Series_1[['Mes', 'Año']].agg('-'.join, axis=1)
Monthly_Time_Series_1
| | Año | Mes | IMPVENTA | Mes-Año |
|---|---|---|---|---|
| 0 | 2021 | 6 | 992891.50 | 6-2021 |
| 1 | 2021 | 7 | 982142.75 | 7-2021 |
| 2 | 2021 | 8 | 885289.16 | 8-2021 |
| 3 | 2021 | 9 | 878563.14 | 9-2021 |
| 4 | 2021 | 10 | 923552.45 | 10-2021 |
| 5 | 2021 | 11 | 1527486.61 | 11-2021 |
| 6 | 2021 | 12 | 1438722.72 | 12-2021 |
| 7 | 2022 | 1 | 1910816.46 | 1-2022 |
| 8 | 2022 | 2 | 1317501.34 | 2-2022 |
| 9 | 2022 | 3 | 1524652.47 | 3-2022 |
| 10 | 2022 | 4 | 2060509.71 | 4-2022 |
| 11 | 2022 | 5 | 2316733.47 | 5-2022 |
| 12 | 2022 | 6 | 2872219.23 | 6-2022 |
| 13 | 2022 | 7 | 2731251.02 | 7-2022 |
| 14 | 2022 | 8 | 1844222.02 | 8-2022 |
| 15 | 2022 | 9 | 1068975.95 | 9-2022 |
| 16 | 2022 | 10 | 895735.29 | 10-2022 |
| 17 | 2022 | 11 | 1544511.61 | 11-2022 |
| 18 | 2022 | 12 | 1272814.69 | 12-2022 |
We create the plot:
fig, ax = plt.subplots()
p=sns.lineplot(x="Mes-Año", y="IMPVENTA", data=Monthly_Time_Series_1 , color='red')
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Monthly Time Series", fontsize = 17)
fig.savefig('p1.jpg', format='jpg', dpi=1200)
plt.show()
8.2 Visualization of Daily Time Series
Daily_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes','Dia'])['IMPVENTA'].sum().reset_index(drop=False)
Daily_Time_Series_1
| | Año | Mes | Dia | IMPVENTA |
|---|---|---|---|---|
| 0 | 2021 | 6 | 1 | 26423.78 |
| 1 | 2021 | 6 | 2 | 18752.01 |
| 2 | 2021 | 6 | 3 | 22812.84 |
| 3 | 2021 | 6 | 4 | 107889.11 |
| 4 | 2021 | 6 | 5 | 136714.44 |
| … | … | … | … | … |
| 574 | 2022 | 12 | 27 | 64542.49 |
| 575 | 2022 | 12 | 28 | 59913.84 |
| 576 | 2022 | 12 | 29 | 53815.43 |
| 577 | 2022 | 12 | 30 | 52695.32 |
| 578 | 2022 | 12 | 31 | 39739.67 |
579 rows × 4 columns
Daily_Time_Series_1['Año'] = Daily_Time_Series_1['Año'].astype('string')
Daily_Time_Series_1['Mes'] = Daily_Time_Series_1['Mes'].astype('string')
Daily_Time_Series_1['Dia'] = Daily_Time_Series_1['Dia'].astype('string')
Daily_Time_Series_1['Dia-Mes-Año'] = Daily_Time_Series_1[['Dia', 'Mes', 'Año']].agg('-'.join, axis=1)
Daily_Time_Series_1
| | Año | Mes | Dia | IMPVENTA | Dia-Mes-Año |
|---|---|---|---|---|---|
| 0 | 2021 | 6 | 1 | 26423.78 | 1-6-2021 |
| 1 | 2021 | 6 | 2 | 18752.01 | 2-6-2021 |
| 2 | 2021 | 6 | 3 | 22812.84 | 3-6-2021 |
| 3 | 2021 | 6 | 4 | 107889.11 | 4-6-2021 |
| 4 | 2021 | 6 | 5 | 136714.44 | 5-6-2021 |
| … | … | … | … | … | … |
| 574 | 2022 | 12 | 27 | 64542.49 | 27-12-2022 |
| 575 | 2022 | 12 | 28 | 59913.84 | 28-12-2022 |
| 576 | 2022 | 12 | 29 | 53815.43 | 29-12-2022 |
| 577 | 2022 | 12 | 30 | 52695.32 | 30-12-2022 |
| 578 | 2022 | 12 | 31 | 39739.67 | 31-12-2022 |
579 rows × 5 columns
fig, ax = plt.subplots()
p=sns.lineplot(x="Dia-Mes-Año", y="IMPVENTA", data=Daily_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Daily_Time_Series_1) , 40))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Daily Time Series", fontsize = 20)
fig.savefig('p2.jpg', format='jpg', dpi=1200)
plt.show()
8.3 Visualization of Weekly Time Series
Weekly_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes','Semana'])['IMPVENTA'].sum().reset_index(drop=False)
Weekly_Time_Series_1['Año'] = Weekly_Time_Series_1['Año'].astype('string')
Weekly_Time_Series_1['Mes'] = Weekly_Time_Series_1['Mes'].astype('string')
Weekly_Time_Series_1['Semana'] = Weekly_Time_Series_1['Semana'].astype('string')
Weekly_Time_Series_1['Semana-Mes-Año'] = Weekly_Time_Series_1[['Semana', 'Mes', 'Año']].agg('-'.join, axis=1)
fig, ax = plt.subplots()
p=sns.lineplot(x="Semana-Mes-Año", y="IMPVENTA", data=Weekly_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Weekly_Time_Series_1) , 5))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Weekly Time Series", fontsize = 17)
fig.savefig('p3.jpg', format='jpg', dpi=1200)
plt.show()
8.4 Visualization of Quarterly Time Series
Quarter_Time_Series_1 = Time_Series_1.groupby(['Año', 'Trimestre'])['IMPVENTA'].sum().reset_index(drop=False)
Quarter_Time_Series_1
| | Año | Trimestre | IMPVENTA |
|---|---|---|---|
| 0 | 2021 | 2 | 992891.50 |
| 1 | 2021 | 3 | 2745995.05 |
| 2 | 2021 | 4 | 3889761.78 |
| 3 | 2022 | 1 | 4752970.27 |
| 4 | 2022 | 2 | 7249462.41 |
| 5 | 2022 | 3 | 5644448.99 |
| 6 | 2022 | 4 | 3713061.59 |
Quarter_Time_Series_1['Año'] = Quarter_Time_Series_1['Año'].astype('string')
Quarter_Time_Series_1['Trimestre'] = Quarter_Time_Series_1['Trimestre'].astype('string')
Quarter_Time_Series_1['Trimestre-Año'] = Quarter_Time_Series_1[['Trimestre', 'Año']].agg('-'.join, axis=1)
fig, ax = plt.subplots()
p=sns.lineplot(x="Trimestre-Año", y="IMPVENTA", data=Quarter_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Quarter_Time_Series_1) , 1))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Quarter Time Series", fontsize = 17)
fig.savefig('p4.jpg', format='jpg', dpi=1200)
plt.show()